Discovering a domain-specific schema from general-purpose knowledge base

SILVA NETO, Everaldo Costa

Please use this identifier to cite or link to this item: https://repositorio.ufpe.br/handle/123456789/51840

Title :	Discovering a domain-specific schema from general-purpose knowledge base
Author :	SILVA NETO, Everaldo Costa
Keywords :	Databases; Schema discovery; Domain discovery; Entity representation
Publication date :	13-Jun-2023
Publisher :	Universidade Federal de Pernambuco
Citation :	SILVA NETO, Everaldo Costa. Discovering a domain-specific schema from general-purpose knowledge base. 2023. Tese (Doutorado em Ciência da Computação) – Universidade Federal de Pernambuco, Recife, 2023.
Abstract :	General-purpose knowledge bases (KBs), e.g., DBpedia, YAGO, and Wikidata, store factual data about a set of entities. These KBs have been constructed to store cross-domain knowledge, e.g., health, entertainment, industry, sports, and arts. Most applications that use data from general-purpose KBs are domain-specific. Some tasks, such as query formulation and information extraction, require a data schema to explore the contents of a KB. However, schema-related declarations are not mandatory and, sometimes, are not provided. Therefore, these domain-specific applications face two issues: (1) they require only the subset of data that falls within the domain of interest, but general-purpose KBs hold a large volume of factual data across many distinct domains; and (2) schema-related information is lacking. In this thesis, we address the problem of domain-specific schema discovery from general-purpose KBs. Specifically, we build ANCHOR, an end-to-end pipeline that automatically identifies a domain-specific dataset as well as its schematic description. ANCHOR works in three steps: domain discovery, class identification, and class schema discovery. First, it extracts a specific domain by exploring category-category mappings from the KB. From this, it identifies domain entities through entity-category mappings. Next, the class identification step discovers implicit classes within the dataset. For that, ANCHOR learns an entity representation from entity-category mappings and uses it to identify the entities' implicit classes by grouping similar entities. Finally, the class schema discovery step builds the class schema, i.e., it identifies a set of relevant attributes that best describe the entities within the same class. For that, ANCHOR runs CoFFee, an approach based on attribute co-occurrence and frequency, to identify a set of core attributes for each class discovered in the previous step.
We have performed an extensive experimental evaluation on four distinct DBpedia domains. For the class identification task, we compare ANCHOR against traditional and embedding-based baselines. The results show that, when applied to standard clustering algorithms, our entity representation outperforms the baselines and is effective for the class identification task. For the class schema discovery task, we compare CoFFee against two state-of-the-art approaches. The results show that CoFFee is effective in filtering out less relevant attributes. It selects a set of core attributes while keeping its retrieval rate high, producing a higher-quality class schema for the identified classes.
URI :	https://repositorio.ufpe.br/handle/123456789/51840
Appears in collections:	Teses de Doutorado - Ciência da Computação
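
The domain discovery step summarized in the abstract can be pictured with a small sketch. This is not the thesis's implementation: it assumes the domain is grown from a single seed category by a depth-limited breadth-first walk over category-category mappings, and that an entity belongs to the domain when at least one of its categories was reached. The function name `discover_domain`, the `max_depth` parameter, and the toy data are all hypothetical.

```python
from collections import deque


def discover_domain(seed, subcategories, entity_categories, max_depth=2):
    """Illustrative only: collect categories near `seed`, then their entities.

    subcategories:     dict mapping a category to its child categories
                       (stands in for category-category mappings).
    entity_categories: dict mapping an entity to its categories
                       (stands in for entity-category mappings).
    """
    domain_categories = {seed}
    queue = deque([(seed, 0)])
    while queue:
        category, depth = queue.popleft()
        if depth == max_depth:
            continue  # assumed cut-off: do not expand beyond max_depth
        for child in subcategories.get(category, ()):
            if child not in domain_categories:
                domain_categories.add(child)
                queue.append((child, depth + 1))

    # An entity is kept if it shares at least one category with the domain.
    domain_entities = {
        entity
        for entity, categories in entity_categories.items()
        if domain_categories & set(categories)
    }
    return domain_categories, domain_entities


# Hypothetical toy mappings, not taken from DBpedia.
subcategories = {
    "Films": ["Drama films", "Film directors"],
    "Drama films": ["1990s drama films"],
    "1990s drama films": ["1994 drama films"],
}
entity_categories = {
    "Forrest_Gump": ["1990s drama films"],
    "Robert_Zemeckis": ["Film directors"],
    "Recife": ["Cities in Brazil"],
}

categories, entities = discover_domain("Films", subcategories, entity_categories)
```

In this toy run the walk stops two levels below the seed, so the entities outside the film categories are left out of the domain-specific dataset. The depth limit is one plausible way to keep a category walk from drifting into unrelated domains; the thesis may use a different criterion.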

Files in this item:

File	Description	Size	Format
TESE Everaldo Costa Silva Neto.pdf		4.11 MB	Adobe PDF

This item is protected by original copyright

This item is licensed under a Creative Commons License